BADREX: In situ expansion and coreference of biomedical abbreviations using dynamic regular expressions

Author

  • Phil Gooch

Abstract

BADREX uses dynamically generated regular expressions to annotate term definition–term abbreviation pairs, and corefers unpaired acronyms and abbreviations back to their initial definition in the text. Against the Medstract corpus BADREX achieves precision and recall of 98% and 97%, and against a much larger corpus, 90% and 85%, respectively. BADREX yields improved performance over previous approaches, requires no training data and allows runtime customisation of its input parameters. BADREX is freely available from https://github.com/philgooch/BADREX-Biomedical-AbbreviationExpander as a plugin for the General Architecture for Text Engineering (GATE) framework and is licensed under the GPLv3.
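The abstract does not give BADREX's actual patterns or GATE annotation types. The Python sketch below only illustrates the general idea it describes. For each parenthesised short form, a regular expression is built at runtime from that short form's letters and used to find the definition immediately before it. Later bare mentions of the short form are then linked back to that definition. The function names `definition_regex` and `expand` are hypothetical, as are the one-word-per-letter rule and the 2 to 10 character short-form limit. They are illustrative assumptions, not BADREX's rules or its GATE plugin API.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical candidate rule: a 2-10 character short form in parentheses,
# starting with a letter, e.g. "(TNF)" or "(NSCLC)".
PAREN_SF = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def definition_regex(short_form: str) -> Optional[re.Pattern]:
    """Compile, at runtime, a regex for a long form whose consecutive words
    start with the short form's letters and which ends just before the '('.
    Assumption: exactly one word (space- or hyphen-separated) per letter."""
    letters = [c for c in short_form if c.isalpha()]
    if len(letters) < 2:
        return None
    # 'TNF' -> \b[tT]\w*[\s-]+[nN]\w*[\s-]+[fF]\w*\s*$
    words = [f"[{c.lower()}{c.upper()}]\\w*" for c in letters]
    return re.compile(r"\b" + r"[\s-]+".join(words) + r"\s*$")


def expand(text: str) -> Tuple[Dict[str, str], List[Tuple[int, str, str]]]:
    """Return (definitions, coreferences).

    definitions maps each short form to the long form found before it;
    coreferences lists (offset, short_form, long_form) for every later
    bare mention of a defined short form, sorted by offset."""
    definitions: Dict[str, str] = {}
    defined_at: Dict[str, int] = {}
    for m in PAREN_SF.finditer(text):
        sf = m.group(1)
        # Keep the first definition; skip lowercase-only candidates like "(see)".
        if sf in definitions or not any(c.isupper() for c in sf):
            continue
        pattern = definition_regex(sf)
        if pattern is None:
            continue
        lf = pattern.search(text[: m.start()])
        if lf:
            definitions[sf] = lf.group(0).strip()
            defined_at[sf] = m.end()  # offset just after the closing ')'

    corefs: List[Tuple[int, str, str]] = []
    for sf, lf in definitions.items():
        start = defined_at[sf]
        # Whole-word mentions of the short form after its definition.
        for m in re.finditer(r"\b" + re.escape(sf) + r"\b", text[start:]):
            corefs.append((start + m.start(), sf, lf))
    corefs.sort()
    return definitions, corefs


if __name__ == "__main__":
    sample = ("Levels of tumour necrosis factor (TNF) were raised. "
              "TNF blockade reduced inflammation, and TNF levels fell.")
    defs, refs = expand(sample)
    print(defs)  # {'TNF': 'tumour necrosis factor'}
    print(refs)  # two (offset, 'TNF', 'tumour necrosis factor') entries
```

Building the pattern from each short form, rather than using one fixed expression, means the long-form search is constrained by the letters actually present. The strict one-word-per-letter rule is a simplification. It misses definitions such as "interleukin-6 (IL-6)" and forms containing unabbreviated words like "of", which a fuller tool would need to handle.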

Journal:
  • CoRR

Volume: abs/1206.4522

Publication date: 2012